add a CUDA device code sanity check #4692

Draft: wants to merge 9 commits into base 5.0.x
Conversation

jfgrimm (Member) commented Oct 24, 2024

At the moment, we do not check that the CUDA compute capabilities that EasyBuild is configured to use are actually present in the resulting binaries/libraries.

WIP PR to introduce an extra sanity check, run when CUDA is present, that checks for mismatches between cuda_compute_capabilities and what cuobjdump reports.
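The gist of the check, as a minimal standalone sketch (the helper name and binary path are hypothetical; the regex mirrors the one used in this PR's initial version):

import re
import subprocess

def get_device_code_ccs(path):
    """Return the compute capabilities (e.g. {'8.0'}) reported by cuobjdump for a binary."""
    out = subprocess.run(['cuobjdump', path], capture_output=True, text=True).stdout
    # cuobjdump reports each embedded code section as 'arch = sm_XY'
    return {'%s.%s' % (m[0], m[1:]) for m in re.findall(r'arch = sm_(\d+a?)', out)}

configured = {'8.0'}  # what EasyBuild was configured with via cuda_compute_capabilities
found = get_device_code_ccs('/path/to/some/binary')  # hypothetical path
missing = sorted(configured - found)
surplus = sorted(found - configured)
if missing or surplus:
    print("Mismatch: missing CCs %s, surplus CCs %s" % (missing, surplus))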

jfgrimm added this to the 5.0 milestone Oct 24, 2024
from easybuild.tools.systemtools import get_shared_lib_ext, pick_system_specific_value, use_group
from easybuild.tools.systemtools import check_linked_shared_libs, det_parallelism, get_cuda_device_code_architectures
from easybuild.tools.systemtools import get_linked_libs_raw, get_shared_lib_ext, pick_system_specific_value, use_group
from easybuild.tools.toolchain.toolchain import TOOLCHAIN_CAPABILITY_CUDA
Review comment: 'easybuild.tools.toolchain.toolchain.TOOLCHAIN_CAPABILITY_CUDA' imported but unused

ocaisa (Member) commented Oct 24, 2024

It's great that you looked into this; we've also been discussing it in EESSI: https://gitlab.com/eessi/support/-/issues/92

jfgrimm (Member, Author) commented Oct 24, 2024

@ocaisa thanks for the link, I'll take a look

Currently, the main things I still plan to add to this PR:

  • an EasyBuild option to toggle whether this is a warning or an error (akin to the RPATH sanity check strictness)
  • whitelisting (e.g. for bundled precompiled binaries)
  • handling software that only allows targeting a single CUDA compute capability

ocaisa (Member) commented Oct 24, 2024

I think it's a good idea to check for both device code and PTX (with the lack of PTX for the highest compute capability being a warning). The availability of PTX will allow you to run the application on future architectures.

"""

# cuobjdump uses the sm_XY format
device_code_regex = re.compile('(?<=arch = sm_)([0-9])([0-9]+a{0,1})')
ocaisa (Member) commented Feb 19, 2025

It would be good to also capture whether the code can be JIT-compiled (so it can at least run on a future arch). In a script I had, I did this with:

import re
import subprocess

def get_cuda_archs(path, debug=False):
    """Extract PTX and ELF (device code) architectures from a binary via cuobjdump."""
    result = subprocess.run(['cuobjdump', path], capture_output=True, text=True)

    # Regex to find multiple PTX and ELF sections
    ptx_matches = re.findall(r'Fatbin ptx code:\n=+\narch = sm_(\d+)', result.stdout)
    elf_matches = re.findall(r'Fatbin elf code:\n=+\narch = sm_(\d+)', result.stdout)

    # Debug: show if matches were found for PTX and ELF sections
    if debug:
        print(f"PTX Matches: {ptx_matches}")
        print(f"ELF Matches: {elf_matches}")

    # Return all PTX and ELF matches, deduplicated using set and sorted
    return {
        "ptx": sorted(set(ptx_matches)),  # list of unique PTX capabilities
        "elf": sorted(set(elf_matches)),  # list of unique ELF capabilities
    }

Reply from a Contributor:

In fact, re.compile('(?<=arch = sm_)([0-9])([0-9]+a{0,1})') is not specific enough, because it treats the Fatbin ptx code and Fatbin elf code sections the same: it just extracts any arch = string it can find.

For a concrete example of something that has both, check e.g. libcusparse:

[casparl@tcn1 ~]$ cuobjdump /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64/libcusparse.so | grep -A 5 ptx | tail -n 12
================
arch = sm_80
code version = [8,1]
host = linux
compile_size = 64bit
--
Fatbin ptx code:
================
arch = sm_90
code version = [8,1]
host = linux
compile_size = 64bit
[casparl@tcn1 ~]$ cuobjdump /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64/libcusparse.so | grep -A 5 elf | tail -n 12
================
arch = sm_80
code version = [1,7]
host = linux
compile_size = 64bit
--
Fatbin elf code:
================
arch = sm_90
code version = [1,7]
host = linux
compile_size = 64bit

@@ -3900,6 +3955,14 @@ def xs2str(xs):
else:
self.log.debug("Skipping RPATH sanity check")

if get_software_root('CUDA'):
Comment from a Member:

@boegel We have an EESSI-specific complication here. We demote CUDA to a build-time dependency so that we don't depend on the CUDA module at runtime. This means that we won't execute this code path, so we need to trigger the module load here.

Reply from a Contributor:

I think you're right, but just to double-check: are the build dependencies unloaded at sanity check time?

Could we fix this through an EasyBuild hook in EESSI that loads the CUDA that was a build dependency also in the sanity_check_step (and unloads it afterwards)? That should also work for EESSI-extend, with no changes needed on the framework side...
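A rough sketch of what such a hook could look like, assuming EasyBuild's usual pre_<step>_hook/post_<step>_hook convention (the CUDA module version here is hypothetical):

# hooks.py, passed to EasyBuild via --hooks=hooks.py
from easybuild.tools.modules import modules_tool

CUDA_MODULE = 'CUDA/12.1.1'  # hypothetical; in practice, derive this from the easyconfig's builddependencies

def pre_sanitycheck_hook(self, *args, **kwargs):
    # load CUDA before the sanity check step so that cuobjdump is on $PATH
    if self.name != 'CUDA':
        modules_tool().load([CUDA_MODULE])

def post_sanitycheck_hook(self, *args, **kwargs):
    # unload CUDA again once the sanity check step is done
    if self.name != 'CUDA':
        modules_tool().unload([CUDA_MODULE])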

Reply from a Contributor:

Wait... actually, I don't think you're right. I just did this with EESSI-extend, and it did run the CUDA sanity check...? I'm not sure why; I would have expected the problem you mentioned. So... why didn't it appear?

Reply from a Member:

I wonder what happens in the --module-only case, when the sanity check step is being run without building first? Perhaps this really is expected behaviour?

self.log.info("Using default subdirectories for binaries/libraries to verify CUDA device code: %s",
cuda_dirs)
else:
self.log.info("Using default subdirectories for binaries/libraries to verify CUDA device code: %s",
casparvl (Contributor) commented Feb 19, 2025

This info message seems wrong: these are not the default subdirectories, this is a custom-defined bin_lib_subdirs.

casparvl (Contributor) commented:

FYI: I checked with @jfgrimm on chat; he probably has little time to work on this in the near future. Since this is a very valuable feature for EESSI that we'd like to have before we start building a large amount of GPU software, I'll try to work on it myself. Note that @jfgrimm was OK with me pushing to his branch, so I'll do that rather than create my own PR - at least we can have the full discussion in one place, namely here.

casparvl (Contributor) commented Feb 19, 2025

I tested this as follows:

  • cloned Jasper's feature branch into $HOME/easybuild/easybuild-framework/
  • loaded EESSI and EESSI-extend: module purge && module load EESSI/2023.06 EESSI-extend/2023.06-easybuild
  • installed EasyBuild from the current 5.0.x branch using the EasyConfig EasyBuild-5.0.x.eb below, using the EasyBuild-4.9.4 from EESSI: eb EasyBuild-5.0.x.eb. This ensures I have the 5.0.x versions of the easyblocks and easyconfigs.
#EasyBuild-5.0.x.eb
# Nice way of installing an EasyBuild installation from the develop branch...
# Install with 'eblocalinstall --force-download ...' to make sure you get the latest version
easyblock = 'EB_EasyBuildMeta'
name = 'EasyBuild'
version = '5.0.x'
homepage = 'https://easybuilders.github.io/easybuild'
description = """EasyBuild is a software build and installation framework
 written in Python that allows you to install software in a structured,
 repeatable and robust way."""
toolchain = SYSTEM
sources = [
    {
        'source_urls': ['https://github.com/easybuilders/easybuild-framework/archive/'],
        'download_filename': '5.0.x.tar.gz',
        'filename': 'easybuild-framework-develop.tar.gz',
    },
    {
        'source_urls': ['https://github.com/easybuilders/easybuild-easyblocks/archive/'],
        'download_filename': '5.0.x.tar.gz',
        'filename': 'easybuild-easyblocks-develop.tar.gz',
    },
    {
        'source_urls': ['https://github.com/easybuilders/easybuild-easyconfigs/archive/'],
        'download_filename': '5.0.x.tar.gz',
        'filename': 'easybuild-easyconfigs-develop.tar.gz',
    },
]
# order matters a lot, to avoid having dependencies auto-resolved (--no-deps easy_install option doesn't work?)
# EasyBuild is a (set of) Python packages, so it depends on Python
# usually, we want to use the system Python, so no actual Python dependency is listed
allow_system_deps = [('Python', SYS_PYTHON_VERSION)]
local_pyshortver = '.'.join(SYS_PYTHON_VERSION.split('.')[:2])
sanity_check_paths = {
    'files': ['bin/eb'],
    'dirs': ['lib/python%s/site-packages' % local_pyshortver],
}
moduleclass = 'tools'
  • Set the following environment variables to pick up the feature branch:
export PATH=$HOME/easybuild/easybuild-framework/:$PATH
export PYTHONPATH=$HOME/easybuild/easybuild-framework/:$PYTHONPATH
  • Added the following configuration (for some reason my robot path was empty; I now make it use the easyconfigs from the 5.0.x install above):
export EASYBUILD_ROBOT_PATHS=/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/EasyBuild/5.0.x/easybuild/easyconfigs
export EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.0
  • tried to install CUDA-Samples:
eb CUDA-Samples-12.1-GCC-12.3.0-CUDA-12.1.1.eb --rebuild

This resulted in

== 2025-02-19 20:55:23,959 build_log.py:226 ERROR EasyBuild encountered an error (at easybuild/easybuild-framework/easybuild/tools/build_log.py:166 in caller_info): Sanity check failed: Mismatch between cuda_compute_capabilities and device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/jitLto. Surplus compute capabilities: 5.2. Missing compute capabilities: 8.0.
Mismatch between cuda_compute_capabilities and device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/inlinePTX_nvrtc. Surplus compute capabilities: 5.2. Missing compute capabilities: 8.0.
Mismatch between cuda_compute_capabilities and device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/conjugateGradientCudaGraphs. Surplus compute capabilities: 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.6, 8.9, 9.0.

And many more. That's great, it means this PR is actually doing what it should. Indeed, checking manually:

$ cuobjdump /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/jitLto

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit

So, yeah... CUDA-Samples is a mess when it comes to its build system. The docs say you can set the CUDA compute capabilities by passing the SMS=<something> argument to make. Just for reference, my build command from the logs was:

rm -r bin/win64 && make -j 16 HOST_COMPILER=g++ SMS='80' FILTER_OUT='Samples/2_Concepts_and_Techniques/EGLStream_CUDA_Interop/Makefile Samples/2_Concepts_and_Techniques/streamOrderedAllocationIPC/Makefile Samples/3_CUDA_Features/tf32TensorCoreGemm/Makefile Samples/3_CUDA_Features/warpAggregatedAtomicsCG/Makefile Samples/4_CUDA_Libraries/boxFilterNPP/Makefile Samples/4_CUDA_Libraries/cannyEdgeDetectorNPP/Makefile Samples/4_CUDA_Libraries/cudaNvSci/Makefile Samples/4_CUDA_Libraries/cudaNvSciNvMedia/Makefile Samples/4_CUDA_Libraries/freeImageInteropNPP/Makefile Samples/4_CUDA_Libraries/histEqualizationNPP/Makefile Samples/4_CUDA_Libraries/FilterBorderControlNPP/Makefile Samples/5_Domain_Specific/simpleGL/Makefile Samples/5_Domain_Specific/simpleVulkan/Makefile Samples/5_Domain_Specific/simpleVulkanMMAP/Makefile Samples/5_Domain_Specific/vulkanImageCUDA/Makefile Samples/0_Introduction/simpleAWBarrier/Makefile Samples/3_CUDA_Features/bf16TensorCoreGemm/Makefile Samples/3_CUDA_Features/dmmaTensorCoreGemm/Makefile Samples/3_CUDA_Features/globalToShmemAsyncCopy/Makefile Samples/4_CUDA_Libraries/simpleCUFFT_callback/Makefile Samples/2_Concepts_and_Techniques/cuHook/Makefile ' && rm bin/*/linux/release/lib*.so.*

Note that there are many executables in CUDA-Samples that were built for the correct CC. E.g.:

$ cuobjdump /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/deviceQuery

Fatbin elf code:
================
arch = sm_80
code version = [1,7]
host = linux
compile_size = 64bit

casparvl (Contributor) commented Feb 19, 2025

Collecting some TODOs:

  • add --strict-cuda-sanity-check EB option (default: no): the regular sanity check would fail (raise an error) if not at least the configured CCs are present. It should report surplus CCs (at least with --debug), but not fail. The strict variant will also fail if there are surplus CCs present (see the sketch after this list). N.B. I'm not in favor of converting the error into a warning here - if you're not getting the CC you're requesting via --cuda-compute-capabilities, that's not what the user is counting on, and that should be a failure. A user can always decide to whitelist to make sure the sanity check passes, but this should be a very conscious decision. Since many of us are building in bulk, with semi-automated pipelines, etc., warnings would too easily be missed.
  • whitelisting (e.g. for bundled precompiled binaries). This will cause the sanity check to be skipped (or at most print a warning/info) for software that is whitelisted. It enables a conscious override by a user to say 'yes, I know this binary wasn't built for the requested CC, and I'm OK with that'.
  • also check for PTX code (and which arch that PTX code is for). We currently don't have any way of asking EasyBuild to build for a certain PTX arch, so a question would be: what do we check against? A logical default would be to check for PTX code for the highest CC in --cuda-compute-capabilities, as this would allow forward compatibility of the binary through JIT compilation.
  • add --strict-ptx-sanity-check (default: no): the regular sanity check would fail (raise an error) if not at least the configured virtual architectures are present. It should report surplus CCs (at least with --debug), but not fail. The strict variant will also fail if there are surplus CCs present. => EDIT: won't do, out of scope, see #4692 (comment)
  • add --cuda-virtual-architectures option to EasyBuild, which can be used to determine for which virtual architecture(s) to compile PTX code. It won't do anything initially, until EB contributors start supporting this in their EasyBlocks and/or we get proper nvcc compiler wrappers that could inject such arguments. => EDIT: won't do, out of scope, see #4692 (comment)
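Concretely, the strict vs. non-strict behavior from the first item boils down to a set comparison along these lines (a minimal sketch; function and option names are illustrative):

def check_device_code_ccs(configured, found, strict=False):
    """Compare configured vs. detected compute capabilities; return an error message, or None if OK."""
    missing = sorted(set(configured) - set(found))
    surplus = sorted(set(found) - set(configured))
    if missing:
        # missing CCs are always an error: the user explicitly asked for them
        return "Missing compute capabilities: %s" % ', '.join(missing)
    if surplus and strict:
        # surplus CCs only fail under --strict-cuda-sanity-check
        return "Surplus compute capabilities: %s" % ', '.join(surplus)
    return None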

casparvl (Contributor) commented:

Ignore list seems to work. Adding

cuda_sanity_ignore_files = ['bin/watershedSegmentationNPP', 'bin/simpleTemplates_nvrtc']

to the EasyConfig for CUDA-Samples results in

== 2025-02-20 23:43:16,229 easyblock.py:3350 DEBUG Sanity checking for CUDA device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc
== 2025-02-20 23:43:16,229 run.py:489 INFO Path to bash that will be used to run shell commands: /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/bin/bash
== 2025-02-20 23:43:16,229 run.py:500 INFO Running shell command 'file /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc' in /tmp/casparl/easybuild/build/CUDASamples/12.1/GCC-12.3.0-CUDA-12.1.1/cuda-samples-12.1
== 2025-02-20 23:43:16,235 run.py:598 INFO Output of 'file ...' shell command (stdout + stderr):
/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, not stripped

== 2025-02-20 23:43:16,235 run.py:601 INFO Shell command completed successfully (see output above): file /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc
== 2025-02-20 23:43:16,235 run.py:489 INFO Path to bash that will be used to run shell commands: /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/bin/bash
== 2025-02-20 23:43:16,235 run.py:500 INFO Running shell command 'cuobjdump /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc' in /tmp/casparl/easybuild/build/CUDASamples/12.1/GCC-12.3.0-CUDA-12.1.1/cuda-samples-12.1
== 2025-02-20 23:43:16,240 run.py:598 INFO Output of 'cuobjdump ...' shell command (stdout + stderr):

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit

== 2025-02-20 23:43:16,240 run.py:601 INFO Shell command completed successfully (see output above): cuobjdump /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc
== 2025-02-20 23:43:16,241 easyblock.py:3376 WARNING Mismatch between cuda_compute_capabilities and device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc. Surplus compute capabilities: 5.2. Missing compute capabilities: 8.0. This failure will be ignored as /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc is listed in 'ignore_cuda_sanity_failures'.
== 2025-02-20 23:43:16,241 easyblock.py:3393 WARNING Configured highest compute capability was '8.0', but no PTX code for this compute capability was found in '/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/simpleTemplates_nvrtc' PTX architectures supported in that file: []

and note that this binary does not get listed in the failure message. So that's the intended behavior: the warning is still printed, but it doesn't result in an error.
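The ignore check itself amounts to matching each file's path relative to the installation directory against the entries of the easyconfig parameter, roughly like this (a sketch; the helper name is hypothetical):

import os

def is_ignored(path, installdir, ignore_list):
    # True if the binary's path relative to the install dir is listed in cuda_sanity_ignore_files
    return os.path.relpath(path, installdir) in ignore_list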

casparvl (Contributor) commented:

Just to test: with all of these files in the ignore list, the installation of CUDA-Samples now passes.

cuda_sanity_ignore_files = [
    'bin/binomialOptions_nvrtc',
    'bin/jitLto',
    'bin/inlinePTX_nvrtc',
    'bin/conjugateGradientCudaGraphs',
    'bin/simpleVoteIntrinsics_nvrtc',
    'bin/MersenneTwisterGP11213',
    'bin/nvJPEG_encoder',
    'bin/vectorAdd_nvrtc',
    'bin/clock_nvrtc',
    'bin/nvJPEG',
    'bin/BlackScholes_nvrtc',
    'bin/simpleAtomicIntrinsics_nvrtc',
    'bin/batchedLabelMarkersAndLabelCompressionNPP',
    'bin/conjugateGradient',
    'bin/simpleAssert_nvrtc',
    'bin/matrixMul_nvrtc',
    'bin/cuSolverDn_LinearSolver',
    'bin/quasirandomGenerator_nvrtc',
    'bin/watershedSegmentationNPP',
    'bin/simpleTemplates_nvrtc'
]

This provides a nice starting point for further tests: I can easily remove one entry from the ignore list and check that I get the expected result.

casparvl (Contributor) commented:

So... the whole thing with checking PTX codes makes me rethink what EasyBuild should do when --cuda-compute-capabilities is set. Currently, this is ill-defined at best. Our official docs say:

List of CUDA compute capabilities to use when building GPU software;
values should be specified as digits separated by a dot, for example:
3.5,5.0,7.2 (type comma-separated list)

But what does that mean? What do we expect the nvcc compiler to do here? Say we were to compile a simple hello world and I would do --cuda-compute-capabilities=8.0,9.0, what would I expect my nvcc invocation to look like?

nvcc hello.cu --gpu-architecture=compute_80 --gpu-code=sm_80,sm_90 -o hello

i.e. would it only build device code for 80/90, and not include PTX? And build both through the lowest common virtual architecture? Or should it do

nvcc hello.cu --gpu-architecture=compute_80 --gpu-code=sm_80,sm_90,compute_80 -o hello

i.e. also include the PTX code for the --gpu-architecture we specified? Or do we expect it to use the generalized option --generate-code so that it does

nvcc hello.cu  --generate-code=arch=compute_80,code=sm_80 --generate-code=arch=compute_90,code=sm_90 -o hello

i.e. the stage one compilation is executed once for each CUDA compute capability, so that the generated sm_90 code can actually use the features from the compute_90 architecture? Or do we expect it to do

nvcc hello.cu  --generate-code=arch=compute_80,code=sm_80 --generate-code=arch=compute_90,code=sm_90 --generate-code=arch=compute_90,code=compute_90 -o hello

so that it actually includes not only the device code for CC80 and CC90, but also the PTX code for CC90 (for forward compatibility)?

Honestly, from a performance perspective, I think it would be best if EasyBuild would indeed use the generalized arguments, so that the sm_90 code would use the full capabilities of the compute_90 virtual architecture. Since EasyBuild focuses on performance, I think this makes sense. The only price you pay is longer compilation time, since you also have to build that compute_90 virtual architecture PTX code. Whether to include the PTX code is a different question. As proposed above, I think this should be a separate option in EasyBuild, so that one can decide in the EB config whether to ship PTX code, and which version(s).

I.e. my proposal would be that if EasyBuild is configured with --cuda-compute-capabilities=7.0,8.0,9.0 and --cuda-virtual-architectures=7.0,9.0 that this would trigger:

nvcc hello.cu  --generate-code=arch=compute_70,code=sm_70 --generate-code=arch=compute_80,code=sm_80 --generate-code=arch=compute_90,code=sm_90 --generate-code=arch=compute_70,code=compute_70 --generate-code=arch=compute_90,code=compute_90 -o hello

Note that it may not always be possible to convince all build systems to actually do this - e.g. some codes might really only compile for a single CUDA compute capability, or the build system doesn't make the distinction between real and virtual architectures to build for. Eventually, the most robust and generic way to get this done might just be to implement nvcc compiler wrappers that inject these --generate-code arguments, along the lines of the sketch below.
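As a rough illustration (not part of this PR), such a wrapper could look something like this; the environment variable names and the location of the real nvcc are hypothetical:

#!/usr/bin/env python3
# Hypothetical nvcc wrapper: injects --generate-code flags derived from
# environment variables, then delegates to the real nvcc.
import os
import sys

REAL_NVCC = '/opt/cuda/bin/nvcc.real'  # hypothetical location of the wrapped compiler

def gencode_args():
    args = []
    for cc in os.environ.get('EB_CUDA_COMPUTE_CAPABILITIES', '').split(','):
        if cc:
            sm = cc.replace('.', '')
            # device code (cubin) for each requested compute capability
            args.append('--generate-code=arch=compute_%s,code=sm_%s' % (sm, sm))
    for cc in os.environ.get('EB_CUDA_VIRTUAL_ARCHITECTURES', '').split(','):
        if cc:
            sm = cc.replace('.', '')
            # PTX for each requested virtual architecture, for JIT forward compatibility
            args.append('--generate-code=arch=compute_%s,code=compute_%s' % (sm, sm))
    return args

os.execv(REAL_NVCC, [REAL_NVCC] + gencode_args() + sys.argv[1:])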

I'm creating a CUDA hello world EasyConfig that can serve as 1) an easy example of how we think --cuda-compute-capabilities and --cuda-virtual-architectures should work in EasyBuild, and 2) an EasyConfig with which we can easily test these things, including the sanity check.

I'm not sure what the best way forward is. If I include everything in this PR, it may be a bit heavy - although honestly, at the framework level it's just about defining the options; the real implementation would have to be done in EasyBlocks and EasyConfigs that use this information...

My plan is to include the options in this PR, and make an accompanying PR for my CUDA hello world that uses these options in the way described above. The rest is then up to anyone updating or creating new EasyBlocks/EasyConfigs that somehow use information on the CUDA compute capability.

casparvl (Contributor) commented:

Ok, change of plans. After thinking it over, this would be a massive scope creep that would delay the sanity check part that we primarily care about in this PR. Instead, in this PR I'll focus on just that: a sanity check for the CUDA device code. We can assume that everyone using EasyBuild expects this to be the meaning of --cuda-compute-capabilities, i.e. they expect that if they specify 8.0,9.0, the resulting binaries contain device code for 8.0 and 9.0. Which virtual architecture was used to get there, or which PTX codes are shipped as part of the binary, are not relevant to that expectation, and can be considered further optimizations that we can do in a separate PR.

I will retain the code that prints a warning when the PTX code doesn't match the highest architecture. Or maybe demote it to an info message. In any case, it's convenient for future reference if EasyBuild extracts this information.

I will not implement a strict option for the PTX code sanity check in this PR. It does not make sense to sanity check for behavior that we haven't clearly defined, i.e. there is no clear definition of what PTX code is expected to be included when someone sets --cuda-compute-capabilities.

casparvl (Contributor) commented:

Everything not sanity-check related is now described in this issue, which can be used to create one or more follow-up PRs.

…nity check on surpluss CUDA archs if this option is set. Otherwise, print warning
casparvl (Contributor) commented Feb 21, 2025

Tested by adding

cuda_sanity_ignore_files = [
    'bin/binomialOptions_nvrtc',
    'bin/jitLto',
    'bin/inlinePTX_nvrtc',
    'bin/conjugateGradientCudaGraphs',
    'bin/simpleVoteIntrinsics_nvrtc',
    'bin/MersenneTwisterGP11213',
    'bin/nvJPEG_encoder',
    'bin/vectorAdd_nvrtc',
    'bin/clock_nvrtc',
    'bin/nvJPEG',
    'bin/BlackScholes_nvrtc',
    'bin/simpleAtomicIntrinsics_nvrtc',
    'bin/batchedLabelMarkersAndLabelCompressionNPP',
    # 'bin/conjugateGradient',
    'bin/simpleAssert_nvrtc',
    'bin/matrixMul_nvrtc',
    'bin/cuSolverDn_LinearSolver',
    'bin/quasirandomGenerator_nvrtc',
    'bin/watershedSegmentationNPP',
    'bin/simpleTemplates_nvrtc'
]

To CUDA-Samples-12.1-GCC-12.3.0-CUDA-12.1.1.eb. Then, with:

eb CUDA-Samples-12.1-GCC-12.3.0-CUDA-12.1.1.eb --rebuild

my build succeeds, whereas with

eb CUDA-Samples-12.1-GCC-12.3.0-CUDA-12.1.1.eb --rebuild --strict-cuda-sanity-check

it fails with:

== 2025-02-21 21:41:21,349 build_log.py:226 ERROR EasyBuild encountered an error (at easybuild/easybuild-framework/easybuild/tools/build_log.py:166 in caller_info): Sanity check failed: Mismatch between cuda_compute_capabilities and device code in /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen2/software/CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1/bin/conjugateGradient. Surplus compute capabilities: 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.6, 8.9, 9.0.  (at easybuild/easybuild-framework/easybuild/framework/easyblock.py:4010 in _sanity_check_step)

as intended.

Only thing left to do for this PR is tests. Not my strong suit, to be honest, but let's see. I guess the tricky thing here is that a true test requires a real CUDA binary, and I'm not sure that's even feasible... To build one, I'd need a CUDA module in the test environment - I'm not sure if we have that. I could try to find a CUDA binary that we could just install (maybe include a hello-world type of CUDA binary) and test with that... Maybe that's the most feasible option. But I have no clue if we can reasonably include binaries in the repo under the test directory. I have an 800KB hello world binary; that shouldn't be too crazy, I guess.

ocaisa (Member) commented Feb 21, 2025

What you can do is create a mock cuobjdump script that parrots output; you're only checking that EB can run the command and parse the output.

casparvl (Contributor) commented:

Damn, you're good. It took me 25 more minutes of looking at other examples to figure out that even if I could ingest a binary, I'd lack the cuobjdump executable. Might indeed as well fake cuobjdump output on a toy build example.
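For reference, a minimal version of that idea (paths are hypothetical; the actual tests use EasyBuild's write_file/adjust_permissions helpers, as shown in the snippets below):

import os
import stat

# canned cuobjdump output for a binary containing sm_80 device code
cuobjdump_txt_sm80 = '\n'.join([
    "Fatbin elf code:",
    "================",
    "arch = sm_80",
    "code version = [1,7]",
    "host = linux",
    "compile_size = 64bit",
])

# write a mock 'cuobjdump' script that just prints this output, and prepend it to $PATH
mockdir = '/tmp/mock-bin'  # hypothetical location
os.makedirs(mockdir, exist_ok=True)
mock = os.path.join(mockdir, 'cuobjdump')
with open(mock, 'w') as fh:
    fh.write('#!/bin/bash\ncat << EOF\n%s\nEOF\n' % cuobjdump_txt_sm80)
os.chmod(mock, os.stat(mock).st_mode | stat.S_IXUSR)
os.environ['PATH'] = mockdir + os.pathsep + os.environ['PATH']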

test_report_regexs=[regex])



Review comment: blank line contains whitespace

regex += "device code architectures match those in cuda_compute_capabilities"
self.test_toy_build(extra_args=args, test_report=test_report_fp, raise_error=True,
test_report_regexs=[regex])

Review comment: blank line contains whitespace




# Test single CUDA compute capability with --cuda-compute-capabilities=8.0
Review comment: too many blank lines (6)

write_file(cuobjdump_file, cuobjdump_txt_sm80, append=True)
adjust_permissions(cuobjdump_file, stat.S_IXUSR, add=True) # Make sure our mock cuobjdump is executable
args = ['--cuda-compute-capabilities=8.0']
test_report_fp = os.path.join(self.test_buildpath, 'full_test_report.md')
Review comment: local variable 'test_report_fp' is assigned to but never used

])

# Section for cuobjdump printing output for sm_90 PTX code
cuobjdump_txt_sm90_ptx = '\n'.join([
Review comment: local variable 'cuobjdump_txt_sm90_ptx' is assigned to but never used

])

# Section for cuobjdump printing output for sm_80 PTX code
cuobjdump_txt_sm80_ptx = '\n'.join([
Review comment: local variable 'cuobjdump_txt_sm80_ptx' is assigned to but never used

])

# Section for cuobjdump printing output for sm_90 architecture
cuobjdump_txt_sm90 = '\n'.join([
Review comment: local variable 'cuobjdump_txt_sm90' is assigned to but never used
